Jinja + MarkupSafe adoption for AnnData._repr_html_ by katosh · Pull Request #9 · settylab/anndata

katosh · 2026-04-21T22:12:50Z

Jinja + MarkupSafe adoption for `AnnData._repr_html_`

Rebuilds the HTML-repr's rendering layer on top of Jinja2 + MarkupSafe while preserving every feature of the original implementation. There is no user-facing behavior change — all 26 scenarios in the visual inspection harness render identically modulo random-data noise.

Open this first — it's the fastest way to confirm the migration preserves every repr feature (nested AnnData, Raw, README modal, SVG tree, TreeData / SpatialData ecosystem examples, no-CSS / no-JS fallbacks, and the adversarial "Evil AnnData" case).

Visual inspection preview: https://gistpreview.github.io/?d4241e6eadd2bfd211f5dca90e2403bc

Why

The original repr assembled HTML via f-strings and a convention that "if your variable is user data, you must remember to call escape_html() before interpolating." That discipline worked but:

- left the safety invariant to reviewer diligence,
- made the data / presentation split implicit, and
- mixed template logic with Python control flow.

@flying-sheep's review of #2236 argued — correctly — that Jinja's autoescape-by-default and MarkupSafe's typed trust boundary give those guarantees for free. This PR adopts that position.

What changed, in two sentences

- Rendering pipeline: Python code now builds structured context dicts and feeds them to Jinja templates. Autoescape is on (select_autoescape(default=True, default_for_string=True)); every plain str is escaped at interpolation, every markupsafe.Markup passes through verbatim.
- Type contract: FormattedOutput fields carrying HTML are typed Markup | None (renamed from *_html to *_markup, because the fields are Markup, not raw strings). Extension packages producing HTML must wrap at the formatter boundary: FormattedOutput(preview_markup=Markup(obj._repr_html_())).
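The two rules above can be illustrated with a minimal sketch. It uses only public Jinja2 and MarkupSafe calls; the inline template and the variable names are illustrative stand-ins, not the PR's actual `entry.j2`.

```python
from jinja2 import Environment, select_autoescape
from markupsafe import Markup

# Same autoescape configuration the PR description quotes.
env = Environment(
    autoescape=select_autoescape(default=True, default_for_string=True)
)

# Hypothetical two-cell row; stands in for a real template file.
template = env.from_string("<td>{{ preview }}</td><td>{{ preview_markup }}</td>")

untrusted = "<script>alert(1)</script>"
rendered = template.render(
    preview=untrusted,                        # plain str: escaped at interpolation
    preview_markup=Markup("<b>3 items</b>"),  # Markup: passed through verbatim
)
print(rendered)
```

The plain string arrives in the output as inert text, while the `Markup` value keeps its tags, which is why wrapping a value in `Markup` amounts to an explicit trust claim.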

Architecture at a glance

anndata/_repr/
├── environment.py      Jinja Environment + autoescape + NUL-scrub finalize hook
├── templates/
│   ├── anndata.j2      outer repr frame
│   ├── section.j2      <details>/<summary> section frame (reused by error/unknown)
│   ├── entry.j2        entry row (name/type/preview cells + expandable variant)
│   ├── _macros.j2      reusable macros (badge, copy_button, muted_span, …)
│   ├── header.j2       top header (type + shape + badges + search)
│   ├── footer.j2       version + memory
│   ├── index_preview.j2 obs_names / var_names preview
│   ├── hints.j2        no-CSS / no-JS hint block
│   ├── error_entry.j2  section-level error placeholder
│   ├── raw_section.j2  Raw single-row section frame
│   ├── raw_repr.j2     inner body of an expanded Raw row
│   ├── x_entry.j2      X attribute row
│   └── max_depth_indicator.j2  depth-limit placeholder
├── registry.py         FormattedOutput (now with Markup | None fields)
├── components.py       thin Python wrappers that call macros
├── formatters.py       built-in TypeFormatters (return FormattedOutput)
├── sections.py         per-section renderers (obs, var, uns, obsm, varm, raw, unknown)
├── core.py             render_section, render_formatted_entry, render_x_entry
├── html.py             generate_repr_html (orchestrator)
└── utils.py            format_number, format_memory_size, format_index_preview, …

Every rendering entry point returns Markup. Internal composition uses Markup("\n").join(...) / Markup.format() or macro calls — never raw str.
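A short sketch of why composition goes through `Markup` rather than raw `str`. The fragments are made up for illustration; only the MarkupSafe calls themselves are real API.

```python
from markupsafe import Markup

rows = [Markup("<div>a</div>"), Markup("<div>b</div>")]

# Markup.join keeps the result typed as Markup, so a template passes it through.
joined_markup = Markup("\n").join(rows)

# str.join discards the Markup type: the result is a plain str, which an
# autoescaping template would escape a second time.
joined_str = "\n".join(rows)

# Markup.format escapes every argument that is not itself Markup.
cell = Markup("<td>{}</td>").format("<x>")

# Markup.join escapes plain-str items and leaves Markup items untouched.
mixed = Markup("").join([Markup("<b>ok</b>"), "<i>raw</i>"])

print(type(joined_markup).__name__, type(joined_str).__name__)
print(cell)
print(mixed)
```

The `joined_str` case is the failure mode to watch for: the content looks identical, but the lost type means the next template interpolation treats it as untrusted text.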

Public API

`FormattedOutput` — renamed fields

| Before (raw str) | After (`Markup \| None`) | Why |
| --- | --- | --- |
| `preview_html` | `preview_markup` | Type annotation actually matches the contract |
| `type_html` | `type_markup` | Same as above |
| `expanded_html` | `expanded_markup` | Same as above |

Plain-text siblings (type_name, preview, tooltip, error) keep their names — they remain autoescaped str.

Ecosystem formatters

Register a TypeFormatter or SectionFormatter exactly as before. The only authoring change: wrap your HTML output in Markup(...) at the boundary:

from markupsafe import Markup
from anndata._repr import register_formatter, TypeFormatter, FormattedOutput

@register_formatter
class MyArrayFormatter(TypeFormatter):
    def can_format(self, obj, context):
        return isinstance(obj, MyArrayType)

    def format(self, obj, context):
        return FormattedOutput(
            type_name=f"MyArray {obj.shape}",
            css_class="anndata-dtype--myarray",
            # Build custom HTML with Markup.format: each non-Markup arg is
            # autoescaped by MarkupSafe. Never use f-string interpolation
            # inside Markup(...) — that bypasses autoescape.
            preview_markup=Markup(
                '<span class="anndata-text--muted">({} items)</span>'
            ).format(obj.n_items),
        )

If the preview doesn't need custom HTML, prefer plain text (autoescaped end-to-end):

preview=f"({obj.n_items} items)"

If your extension already has an _repr_html_() from another package, wrap it at the boundary:

preview_markup=Markup(obj._repr_html_())

Three valid idioms, in order of preference:

1. preview=<str>: plain text, autoescaped. Simplest.
2. preview_markup=Markup('<tag>{}</tag>').format(value): standard MarkupSafe pattern. Use when you need custom HTML.
3. preview_markup=get_macros().<macro>(value): invoke a Jinja macro directly; benefits from the template engine's NUL-scrub finalize hook.

Never write Markup(f'...{value}...') — the f-string interpolates before Markup sees it, bypassing autoescape.
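The difference is easy to see side by side. This is a minimal sketch with a made-up payload, using only MarkupSafe itself.

```python
from markupsafe import Markup

value = "<img src=x onerror=alert(1)>"

# Anti-pattern: the f-string builds the full string first, so Markup receives
# already-interpolated text and marks all of it as trusted.
unsafe = Markup(f"<span>{value}</span>")

# Correct: the literal is trusted, the argument is escaped by Markup.format.
safe = Markup("<span>{}</span>").format(value)

print(unsafe)
print(safe)
```

Both results are `Markup` instances, so nothing downstream can tell them apart by type; the only defense is to keep user data out of the string passed to the `Markup(...)` constructor.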

Helper modules

All render_* helpers (badges, entry cells, section scaffold, nested content) return Markup. All macros in _macros.j2 are callable from Python via get_macros() (see environment.py) for extension packages that want to compose repr pieces without going through the templates directly.
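For readers unfamiliar with calling Jinja macros from Python, here is a hedged sketch of the underlying mechanism. It does not use anndata's `get_macros()`; the `badge` macro body and the inline template are assumptions for illustration, and the real `_macros.j2` may differ.

```python
from jinja2 import Environment
from markupsafe import Markup

env = Environment(autoescape=True)

# Hypothetical macro source; a real project would load this from a file.
macros = env.from_string(
    '{% macro badge(text) %}<span class="badge">{{ text }}</span>{% endmacro %}'
).module

# Macros exported by a template module are ordinary Python callables.
result = macros.badge("<raw>")

print(type(result).__name__)
print(result)
```

Because the macro body is rendered by the template engine, its arguments are autoescaped like any other interpolation, and with autoescape enabled the return value is already `Markup`.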

Safety posture

What's enforced

- Autoescape on every template interpolation: any str passed into {{ … }} gets HTML-escaped. The only way untrusted data can appear verbatim is if it's wrapped as Markup, which is an explicit trust claim.
- TestMarkupAutoescapeContract (new, in tests/repr/test_repr_robustness.py) verifies the three *_markup fields correctly escape bare-str contract violations and pass Markup-typed values through verbatim.
- TestEscapingCoverage (pre-existing) verifies every user-data insertion point (obs / var / uns keys and values, category values, DataFrame columns, README content) gets HTML-escaped.
- _container_id is validated against ^[A-Za-z][A-Za-z0-9_-]*$ before being interpolated into the <script> block, which closes the defense-in-depth gap that the auto-generated UUID path avoided.

Known MarkupSafe caveat: NUL bytes

MarkupSafe's escape() / Markup.format() intentionally do not scrub NUL bytes (\x00). NUL is not HTML-significant per the HTML5 spec, but it can truncate attribute values in some parsers and generally produces ugly output.

How anndata handles it:

- Jinja template path (the bulk of rendering): a finalize hook on the Environment replaces NULs in every str interpolation with U+FFFD. Every template-rendered attribute and text node is NUL-safe.
- Macro path for extension authors: get_macros().<macro>(user_data) renders through the same engine, so it picks up the finalize hook automatically. This is the recommended pattern when custom HTML must embed user data.
- Targeted internal sites: format_index_preview() and _build_readme_icon() (the only Python-side sites that touch potentially-NUL user data) scrub explicitly via .replace("\x00", "\ufffd").
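A finalize hook of the kind described above might be wired up as follows. This is a sketch under assumptions: `scrub_nul` and the inline template are hypothetical names, and the real hook in `environment.py` may differ in detail.

```python
from jinja2 import Environment, select_autoescape
from markupsafe import Markup


def scrub_nul(value):
    """Replace NUL bytes in string values before Jinja escapes them."""
    if isinstance(value, str):
        return value.replace("\x00", "\ufffd")
    return value


env = Environment(
    autoescape=select_autoescape(default=True, default_for_string=True),
    finalize=scrub_nul,  # runs on every {{ ... }} expression result
)
template = env.from_string('<span title="{{ v }}">{{ v }}</span>')
rendered = template.render(v="a\x00<b>")

# For contrast: Markup.format escapes HTML but leaves the NUL byte in place.
plain = Markup("<i>{}</i>").format("a\x00b")

print(repr(rendered))
print(repr(str(plain)))
```

The contrast with `Markup.format` shows why the macro path is recommended when user data may contain NULs: only output that goes through the template engine sees the hook.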

We deliberately did not introduce a custom safe_format helper. The MarkupSafe API is the common vocabulary in the Python ecosystem; shadowing it with a bespoke wrapper would add something to learn for marginal benefit. Extension authors needing NUL safety for custom HTML can either call a macro via get_macros() or scrub inline — both are documented. The trust boundary in FormattedOutput.*_markup is about XSS (handled by autoescape), not NUL hygiene.

Dependencies

Adds two direct runtime dependencies (both MIT-licensed, widely packaged):

jinja2>=3.1
markupsafe>=3.0

Jinja depends on MarkupSafe, so in practice only one new install. No other dependency changes.

Metrics

HTML tags in Python (excluding docstring examples and the <pre> repr_html_enabled=False fallback):
- sections.py: 41 → ~1
- html.py: 22 → ~1 (the <pre> fallback)
- components.py: 35 → ~0
- core.py: 11 → ~0
escape_html call sites: 78 → 0 (helper removed entirely).
Templates: 0 → 13.

What this PR does not do

- Does not add new user-facing features. Every displayed element, badge, truncation, color dot, SVG tree, nested AnnData view, README modal, search box, keyboard shortcut, and CSS rule is preserved.
- Does not change the repr's CSS or JavaScript. Those files are untouched; only how the HTML around them is assembled has changed.
- Does not require any ecosystem package to update unless they were using preview_html / type_html / expanded_html directly. In that case the field renames (*_markup) are the only change, and their HTML must now be Markup-wrapped at the boundary.

Routes the top-level repr through a single autoescape-enabled Jinja template and wraps existing formatter-produced HTML fragments in markupsafe.Markup at the boundary. Formatter internals (formatters.py, registry.py, components.py, sections.py, core.py) are untouched.

The safety contract at the outer template:
- plain-str values (container_id, depth, style) are autoescaped by default
- Markup-wrapped fragments (header, sections, css, js, hints) pass through

Adds jinja2>=3.1 and markupsafe>=3.0 to dependencies. Adds a minimal Environment module and one outer anndata.j2 template.

The existing tests/visual_inspect_repr_html.py visual harness runs cleanly against this branch and produces the full 26-scenario comparison artifact. Repr test suite: 614 passed, 1 skipped, zero regressions.

Replace the f-string-assembled section frame in ``core.py`` with ``templates/section.j2``. One template covers both the normal ``<details>``-with-entries shape and the empty-state placeholder.

- ``render_section`` now renders through the template and returns ``Markup``.
- ``render_empty_section`` delegates to ``render_section(n_items=0, …)``.
- ``render_truncation_indicator`` now returns ``Markup``.
- Constants used inside templates (``CSS_TEXT_ERROR``, ``CSS_TEXT_MUTED``, ``NOT_SERIALIZABLE_MSG``, ``STYLE_HIDDEN``) are exposed as environment globals so templates can reference them symbolically.

Transition: ``render_section(entries_html=…)`` accepts ``str`` or ``Markup``. Bare ``str`` is wrapped in ``Markup`` at the boundary so existing callers that still produce raw HTML fragments are preserved. Entry-level rendering will migrate in a follow-up; at that point internal callers will pass ``Markup`` directly.

Adds templates/_macros.j2 (badge, copy_button, muted_span, warning_icon, wrap_button) and templates/entry.j2 for one-shot row rendering. render_formatted_entry now returns Markup via entry.j2 instead of assembling sub-cells in Python. Small component helpers (render_badge, render_copy_button, render_muted_span, render_warning_icon, wrap-button helpers) delegate to the new macros and return Markup. render_header_badges composes its parts via Markup.join. environment.get_env() now uses a finalize callback to scrub NUL bytes from str values before autoescape, preserving the scrubbing previously done in utils.escape_html on the Python render path. Public API is stable; return types tightened from str to Markup (Markup subclasses str, so callers using string ops still work).

Section-level HTML assemblers (_render_unknown_sections, _render_error_entry, _render_raw_section, and the per-attr section renderers) now return Markup. Internal "\n".join(parts) stays but the result is wrapped at the return boundary. Return type annotations updated from str to Markup. No behavioral change — existing escape_html discipline is preserved; the Markup wrap makes the trust claim explicit and unblocks the phase-C finalization that removes the transitional str→Markup wrap in render_formatted_entry.

TypeFormatters that produce preview_html / expanded_html / type_html now return Markup at construction (not str). The dataclass types were already tightened in phase B; this tightens the values. No behavioral change — each fragment already used escape_html on user data and is safe HTML; the Markup wrap makes the trust claim explicit and allows phase C finalization to remove the transitional str→Markup wrap currently applied in render_formatted_entry.

…phase C) FormattedOutput.preview_html / type_html / expanded_html are now typed ``Markup | None`` (was ``str | None``). Component helpers that were still returning ``str`` — render_entry_row_open, render_search_box, render_nested_content, render_name_cell, render_entry_type_cell, render_entry_preview_cell — now return ``Markup``. Their composition sites use ``Markup("").join(...)`` so the return value is a real ``Markup`` instance rather than a plain string that happens to contain HTML. TypeCellConfig.type_html is also typed ``Markup | None``. No behavioral change: every site already used escape_html() around user data. Tightening the types makes the trust boundary enforceable at the annotation level and removes the ambiguity of "is this str safe HTML or not?" Paired with the formatter/section wraps in 95ddde9 and 9a7f273, this lets the next commit remove the transitional str→Markup wrap in render_formatted_entry.

…tion) Phase B added an implicit Markup wrap in render_formatted_entry so that formatters still returning str preview_html / type_html / expanded_html could flow through entry.j2 without being autoescaped. With all internal formatters (95ddde9), sections (9a7f273), and components (42f655f) now returning Markup, the wrap is dead weight — remove it. Also wraps the two fallback-formatter preview_html f-strings in registry.py in Markup(), and updates the module docstring example to show the Markup(...) idiom at the formatter boundary. FormattedOutput.preview_html / type_html / expanded_html are now Markup-typed end-to-end: every internal producer returns Markup, the dataclass stores Markup, and entry.j2 passes it through autoescape verbatim. Extension packages producing HTML must now wrap at the formatter boundary (documented in the module docstring).

Post-phase-C, FormattedOutput.preview_html / expanded_html are typed Markup | None and entry.j2 autoescapes bare str. The ecosystem examples in tests/visual_inspect_repr_html.py were still passing raw str for:

- TreeData ObstSectionFormatter / VartSectionFormatter: expanded_html was the SVG tree string from _render_tree_svg
- TreeMetadataSectionFormatter.render_html: returned raw str
- MuData ModSectionFormatter: expanded_html was generate_repr_html output
- SpatialData: preview_html for images / labels / points / shapes, expanded_html for the nested AnnData in tables
- Uns TypeFormatter custom preview
- Ontology extensibility TypeFormatter preview_html

Each site now wraps its HTML in markupsafe.Markup so the trust claim is explicit, matching the public contract now that the type tightening is end-to-end.

Bug this fixes: the "gene_ontology DiGraph (54 nodes, 45 leaves)" placeholder and the nested AnnData inside SpatialData tables were being rendered as escaped HTML source text instead of their intended markup.

The Jinja migration typed these fields as ``markupsafe.Markup | None``; the ``_html`` suffix was a leftover from the plain-``str`` era and misdescribed their contract: a bare string flowing into a field named ``preview_html`` looked fine to ecosystem authors but was silently autoescaped at the template boundary.

Renames (on FormattedOutput, TypeCellConfig, entry.j2, and every call site and docstring):
- preview_html → preview_markup
- type_html → type_markup
- expanded_html → expanded_markup
- append_type_html → append_type_markup (bool flag that mirrors the renamed field)
- index_preview_html (local in html.py) → index_preview_markup

The plain-text siblings (``preview``, ``type_name``, ``tooltip``) keep their names. The bare-name / ``_markup``-suffix pair now cleanly reflects the type contract: autoescaped str vs trusted Markup.

Also wraps every bare-string assignment to the renamed fields in ``Markup(...)``: four docstring examples in __init__.py / registry.py / core.py, and three test assignments in test_repr_registry.py / test_repr_formatters.py that were previously passing through the autoescape path and being rendered as escaped HTML text.

No backwards-compat shims: nothing has been released.

Stragglers from the phase-C review that kept a few str return types and transitional wraps alive:

- ``core.py::render_x_entry`` → returns ``Markup`` (was ``str``); parts list is typed ``list[Markup]`` and joined with ``Markup("\n").join``.
- ``html.py``: ``_render_header`` / ``_render_footer`` / ``_render_index_preview`` / ``_render_max_depth_indicator`` → ``Markup``. Their callers in ``generate_repr_html`` drop the transitional ``Markup(...)`` wraps.
- ``html.py::_render_all_sections`` → ``list[Markup]``; drop the ``[Markup(s) for s in …]`` comprehension at the caller.
- ``html.py::_render_section`` / ``_render_custom_section`` → ``Markup`` (the latter wraps ``formatter.render_html(…)`` output so extension packages can still return plain ``str``).
- ``html.py::generate_repr_html`` → ``Markup``. This removes a redundant ``Markup(Markup(...))`` double-wrap in ``AnnDataFormatter``'s nested repr construction.
- ``formatters.py::AnnDataFormatter.format``: drop the inner redundant ``Markup(...)`` now that ``generate_repr_html`` returns ``Markup``.
- ``_render_footer`` uses ``Markup('<tag>{}</tag>').format(value)`` for safe plain-text interpolations (version string, memory size) instead of redundant ``escape_html`` on strings that can't contain HTML chars.
- ``html.py``: moved the side-effect ``from . import formatters`` next to the other first-party imports (was below ``TYPE_CHECKING``).

Also wraps two bare-string assignments surfaced by the rename:
- ``tests/repr/test_repr_registry.py:550``: ``preview_markup=f'…'`` → ``Markup(f'…')``
- ``tests/repr/test_repr_formatters.py:841``: ``expanded_markup=tree_html`` where ``tree_html`` was a bare triple-quoted string → ``Markup(...)``

Drops "POC" / "middle-ground" language from ``anndata.j2`` and ``core.py`` now that the migration has landed.

…h B) Internal callers in sections.py (3 sites) and html.py (1 site) switch from ``"\n".join(rows)`` to ``Markup("\n").join(rows)``. With every caller now producing ``Markup``, ``render_section``'s signature tightens to ``entries: Markup`` (was ``str | Markup``) and the transitional implicit-wrap comment in ``core.py:108-111`` is gone. ``render_empty_section`` feeds ``Markup("")`` for the same reason. Docstring examples in ``core.py`` and ``__init__.py`` updated to show the new idiom.

…call sites

Replace the `Markup(f'...{escape_html(x)}...')` pattern with the idiomatic `Markup('...{}...').format(x)`. MarkupSafe's `.format()` autoescapes non-Markup args, so the manual escape_html wrapping is redundant. Treating the template string as trusted HTML and letting .format() escape user data is less error prone and removes a hand-rolled escape boundary at every call site.

Migrated ~27 call sites across:
- components.py: row_open, search_box, name_cell, category_list, type_cell
- core.py: x_entry (error + type), formatted_entry error preview
- formatters.py: DataFrame columns preview, color swatches
- html.py: header type/filepath/lazy filepath, README icon, disabled fallback
- registry.py: unknown-type error and warning previews
- sections.py: unknown sections type cell, error entry

`escape_html` definition retained in utils.py (remains exported from anndata._repr as a public helper). Usage in src/anndata/_repr/ drops from 27 call sites to 0 (outside utils.py's own internal use).

Side improvements along the way:
- registry.py: fixed a latent bug where the unknown-type warning preview built a plain str instead of a Markup (`preview_markup = f'...'`). Now a Markup, matching the field's type.
- html.py: the `repr_html_enabled=False` fallback now returns a Markup instead of a plain str, matching the function's declared return type.

All 614 repr tests pass; pre-commit clean.

Moves all entry-row cell markup into `_macros.j2` so the Jinja templates are the single source of truth. Python helpers in `components.py` and `core.py` become thin wrappers that call the macros via `.module`, so the public signatures and return types (Markup) don't change.

Macros added to `_macros.j2`:
- `name_cell(entry_key)`
- `type_cell(type_name, css_class, type_markup=None, tooltip='', all_warnings=None, is_not_serializable=false, has_columns_list=false, has_categories_list=false, append_type_markup=false)`: takes every input as an explicit parameter instead of reading outer template scope
- `preview_cell(preview_markup=None, preview_text=None)`
- `row_open(key, dtype, css_class, has_expandable_content=false)`: caller builds the space-joined class string
- `nested_content(html_content)`
- `truncation_indicator(remaining)`
- `category_list(items, total_hidden=0)`: `items` is a sequence of `(label, safe_color_or_none)` pairs so Python keeps ownership of `sanitize_css_color`; total_hidden is pre-computed by the wrapper

`entry.j2` now just imports the macros and dispatches the row open / three cells / close, with no more local macro definitions.

HTML-tag count in `components.py` drops from 29 to 10 (remaining tags are inside `render_search_box`, which is out of scope for C1, plus docstring examples).

Tests: 614 passed, 1 skipped; visual inspection unchanged.

…nts) (C3)

Move the remaining orchestrator-layer HTML in `_repr/html.py` into Jinja templates. Python keeps the structural logic (badge construction, README truncation, backing-info lookup, memory formatting); templates now own all frame HTML.

New templates in `_repr/templates/`:
- `header.j2`: `<div class="anndata-header">` with type/shape, an ordered list of pre-rendered `extras` (badges + filepath spans + README icon), and an optional search box.
- `footer.j2`: `<div class="anndata-footer">` with version + optional memory string.
- `index_preview.j2`: two-line obs_names / var_names preview.
- `hints.j2`: static no-CSS / no-JS hint block.
- `max_depth_indicator.j2`: single-line depth-limit placeholder.

`_render_header`, `_render_footer`, `_render_index_preview`, `_render_max_depth_indicator` now build a context dict and call the matching template. A new `_render_hints()` helper replaces the inline hints Markup in `generate_repr_html`. README-icon Markup construction is factored into `_build_readme_icon` (kept in Python so the truncation logic doesn't leak into templates).

Visible HTML-tag constructs in `html.py` drop from 19 to 5 (the remainder: a `<pre>` fallback when repr is disabled, two tiny filepath `<span>` wrappers that pair with badges, and two strings inside docstrings/comments).

Tests: `614 passed, 1 skipped` in `tests/repr/`.

Moves HTML scaffolding for three section-level renderers in sections.py out of Python string concatenation and into Jinja templates:

- _render_unknown_sections now reuses section.j2 (extended with an optional extra_classes parameter) so the "other" section drops into the same <details>/<summary> frame as every other section while still getting its anndata-sec-unknown class. Per-row type/preview cells are built with the existing components (TypeCellConfig, render_name_cell, render_entry_preview_cell).
- _render_error_entry now renders through a new error_entry.j2. This section doesn't fit section.j2's entries-grid shape (it holds a single red error message, not an entries list), so a dedicated template is clearer than overloading section.j2.
- _render_raw_section now renders its outer frame through a new raw_section.j2. It's a single-row "anndata-sec" wrapper (not the normal anndata-section <details> frame), so it also gets its own template. The row itself is still assembled from components.py.
- _generate_raw_repr_html (the body rendered inside the expanded Raw row) now renders through a new raw_repr.j2, which mirrors anndata.j2's shape but drops the sections Raw doesn't have.

Also:
- core.render_section grows an extra_classes keyword used by the unknown-section path.
- _safe_index_preview extracted from the inline try/except ladder in _generate_raw_repr_html so the template just receives Markup|None.

HTML tag count in sections.py drops from 30 to effectively 1 (only <div class="anndata-entry__nested-anndata"> remains, wrapping the nested Raw repr inside the expandable entry; see comment below).

Tests: 614 passed, 1 skipped. No behavior change intended.

New class ``TestMarkupAutoescapeContract`` in tests/repr/test_repr_robustness.py (4 tests):

- Three negative tests: an extension-style TypeFormatter returns ``FormattedOutput(preview_markup=<bare str with <script>>)`` / same for type_markup and expanded_markup. All three verify the script tag comes out escaped as ``&lt;script&gt;...&lt;/script&gt;``. Jinja autoescape catches the contract violation, so a future regression that loosens the type back to ``str`` (or a template change that disables autoescape) can't silently land.
- One positive control: a correctly-wrapped ``Markup(html)`` value flows through verbatim.

Release note for PR scverse#2236 updated to mention the Jinja/MarkupSafe dependency and the ``*_markup`` field naming.

- ``section.j2`` no longer hardcodes ``'(empty)'`` when ``n_items == 0``; ``render_section`` sets the default ``count_str`` based on n_items so callers that pass an explicit ``count_str`` (e.g. ``"(5 columns)"``) are respected even for empty sections.
- New ``filepath_span(path, style='')`` macro in ``_macros.j2`` with a Python wrapper ``render_filepath_span`` in components.py. ``html.py`` uses it for the backed/lazy filepath spans, which drops the last two ``<span>`` f-strings in the orchestrator (html.py HTML-tag count goes from 5 to 2; the residual 2 are false-positives in a docstring and the ``<pre>`` repr_html_enabled=False fallback).
- ``format_index_preview`` now returns ``Markup`` (was ``str`` that happened to contain escaped HTML, a latent double-escape hazard). Items are joined with ``Markup(", ").join(...)`` for autoescape. The ``Markup(format_index_preview(...))`` re-wraps in ``_render_index_preview`` are dropped.
- Orchestrator templates (``header.j2``, ``footer.j2``, ``index_preview.j2``) reindented to the 2-space convention used by every other block-level template; in-line/macro files in ``_macros.j2`` and ``entry.j2`` stay compact on purpose.

… strengthen NUL test (F3)

- validate caller-supplied _container_id against ^[A-Za-z][A-Za-z0-9_-]*$; raise ValueError on violation (auto-UUID path unchanged)
- add TestMarkupAutoescapeContract.test_section_formatter_render_html_is_trusted pinning the ecosystem-extension trust contract
- strengthen test_unicode_in_readme: assert NUL not in rendered HTML

escape_html removal (14 sites):
- Definition + html import in src/anndata/_repr/utils.py
- Public export in src/anndata/_repr/__init__.py (import + __all__ entry)
- test_escape_html unit test in tests/repr/test_repr_utils.py
- escape_html assertion in tests/repr/test_repr_core.py
- Extension example in tests/test_repr.py (migrated to Markup.format())
- 8 ecosystem example sites in tests/visual_inspect_repr_html.py (imports, docstring, TreeMetadata/MockSpatialData/ontology formatters all migrated to Markup(...).format(...) pattern)

_macros() dedup:
- Moved to src/anndata/_repr/environment.py as get_macros()
- Dropped local helpers + cache/get_env imports in components.py and core.py
- Updated 14 call sites (12 in components.py, 2 in core.py)

__all__ audit (src/anndata/_repr/__init__.py):
- Removed STYLE_HIDDEN, NOT_SERIALIZABLE_MSG, DOCS_BASE_URL, get_section_doc_url from __all__ and their now-unused imports. No external references in tests/ or docs/.

Replaces scattered ``Markup('<tag>{}</tag>').format(...)`` patterns with Jinja macros where the template owns the HTML and autoescapes the variables. Python call sites keep a single ``Markup(_macros().<name>(...))`` wrap around trusted macro output.

Macros added to ``_macros.j2``:
- ``error_preview(message)`` / ``warning_preview(message)`` / ``muted_error_span(error_msg)``: preview-column status spans.
- ``columns_preview(columns)``: DataFrame column list in obsm/varm.
- ``color_swatch(color, label, valid=true)`` / ``color_preview(swatches, overflow_count=0)``: single swatch + aggregated swatch wrapper with "+N" tail.
- ``nested_anndata_wrapper(inner)``: trusted Markup wrapper for nested AnnData repr fragments.
- ``readme_icon(content, tooltip)``: ⓘ icon with data-readme attribute.
- ``pre_fallback(text)``: <pre> block for the HTML-disabled fallback.
- ``search_box(search_id)``: full hidden search input + toggles.

New template ``x_entry.j2`` replaces the 30-line parts-list construction in ``render_x_entry`` with a state-dispatched template (ok / none / attribute_error / format_error).

Migrated call sites (11): core.py (x_entry, formatted_entry error preview), registry.py (fallback error/warning previews), formatters.py (DataFrame column preview, categorical count fallbacks, color swatches + wrapper, nested AnnData wrapper), sections.py (raw nested wrapper, unknown sections now route through ``render_formatted_entry``), components.py (search box), html.py (README icon + <pre> fallback).

``environment.py`` gains CSS_COLORS, CSS_COLORS_SWATCH, CSS_COLORS_SWATCH_INVALID, CSS_DTYPE_UNKNOWN, CSS_NESTED_ANNDATA, and CSS_TEXT_WARNING as env globals so the macros can reference them by name.

Side effect: the README icon migration fixes a pre-existing NUL-byte leak. The previous ``Markup(...).format(readme_content, tooltip_text)`` only HTML-escaped; NUL bytes flowed through into the ``data-readme`` attribute. ``_build_readme_icon`` now scrubs NULs explicitly before handing off to the macro.

…expose get_macros()

Issue surfaced by review: the docstring examples in __init__.py and registry.py were teaching ecosystem authors the anti-pattern

preview_markup=Markup(f'<span class="...">({obj.n_items} items)</span>')

The f-string interpolates ``obj.n_items`` before Markup sees it, bypassing autoescape. Replaced every example (six sites across __init__.py, registry.py, and core.py) with the correct idioms:
- preview=<str> (plain text, autoescaped)
- preview_markup=Markup('<tag>{}</tag>').format(v) (standard MarkupSafe)
- preview_markup=Markup(obj._repr_html_()) (reuse trusted HTML)
- preview_markup=get_macros().my_macro(v) (Jinja macro path)

Exposes ``get_macros()`` from ``anndata._repr`` so the fourth idiom is available to extension packages; the macro path is the only one that benefits from the engine's NUL-scrub finalize hook. Explicitly flags ``Markup(f'...{v}...')`` as the pattern to avoid in both the registry.py module docstring and the ``TypeFormatter`` class docstring.

No behavior change. Nothing added to the public API beyond ``get_macros`` (already used internally by components.py / core.py).

str.join over a list of Markup returns plain str, which Jinja autoescapes when interpolated by render_section(entries=...). The result was every entry's HTML rendering as escaped text (&lt;div&gt;). Switch to Markup("\n").join(rows) in the six render_section call sites inside MockSpatialData so the demo teaches the right idiom.

# Conflicts: # src/anndata/_repr/html.py

katosh added 24 commits April 20, 2026 14:33

merge html_rep: pick up style commit + JS guard/test fix

fb29934

Merge branch 'html_rep' into jinja-markup-poc-2

177895d

# Conflicts: # src/anndata/_repr/html.py

katosh mentioned this pull request Apr 21, 2026

feat: Add HTML representation scverse/anndata#2236

Open

3 tasks

katosh wants to merge 24 commits into html_rep from jinja-markup-poc-2
